A Method of Structure Comparison using Spatial Topological Patterns

Authors

  • Sung-Hee Park
  • Keun Ho Ryu
Abstract

Comparing structural similarity is a complex and computationally expensive problem. The first step toward solving it for 3D structure databases is to develop fast methods for structural similarity. We therefore propose a new method for comparing structural similarity in protein structure databases that uses the topological patterns of proteins. In our approach, the geometry of secondary structure elements (SSEs) in 3D space is represented by spatial data types and indexed using R-trees. Topological patterns are discovered from spatial topological relations computed with an R-tree index join. A similarity search algorithm compares the topological patterns of a query protein with those of the proteins in structure databases by the intersection frequency of SSEs. Our experimental results show that our method executes three times faster than the well-known method DALITE. Our method can generate small candidate sets for more accurate alignment tools such as DALI and SSAP.

Introduction

The prediction of protein function has become a major concern in bioinformatics. Protein functions usually depend on the 3D structures of proteins. One approach to predicting the structure of a new protein is to compare it, by sequence and structure similarity, with proteins whose structures are already known. The first step is to develop fast structure comparison algorithms. The problem of protein structure comparison falls into two categories: 1) pairwise protein structure comparison, and 2) similarity search over 3D protein structure databases, which finds the most similar structures in a database so that functions can be predicted from them. Comparing a query protein with the structures in a 3D structure database is much more complex and computationally expensive. Previous work [5, 6, 9] has shown that the comparison problem is quite complicated.
Also, there is no exact solution to the protein structure alignment problem; the best available solutions are heuristic algorithms. When performing a database search, all of these methods search exhaustively. The algorithms that find the superposed substructures stop when the alignment no longer changes much or when the iteration count exceeds a maximum value. In most cases, existing comparison systems such as DALI [6] and VAST [9] do not provide timely search results.

The goal of this work is to develop a method for rapid similarity search in 3D protein structure databases. Our approach adopts a multi-step query process that includes a filtering step and a refinement step in order to reduce search cost. Our method is intended to serve as the filter step for existing sophisticated structural alignment methods. In this paper, we describe a method that compares the topological patterns of 3D structures using spatial topological operators with the R-tree multidimensional index. We approximate secondary structure elements (SSEs) by vectors and represent them with spatial types. The spatial representation of SSE vectors is indexed with R-trees, and the topology of proteins is derived by spatial operators. R-trees facilitate a search algorithm that compares the topological patterns of SSE vectors in 3D structure databases.

This work contributes a fast structure comparison method that reduces the number of candidates by fast filtering with a multidimensional index. Spatial topology is preserved under transformations such as translation, rotation and scaling. These properties remain invariant even when the shape of a protein, such as its lengths and angles, changes. In addition, discovering patterns of topological relations is computationally less expensive than searching for completely matching geometry. For this reason, we consider the topological properties of proteins suitable for the filter step of structure comparison.

Key Engineering Materials Vols. 277-279 (2005) pp 272-277, online at http://www.scientific.net. © (2005) Trans Tech Publications, Switzerland. Online available since 2005/Jan/15. All rights reserved. No part of contents of this paper may be reproduced or transmitted in any form or by any means without the written permission of the publisher: Trans Tech Publications Ltd, Switzerland, www.ttp.net.

Topology of Proteins with Spatial Relationship

Each application of structure analysis uses different structure levels and different features. Proteins have various geometric representations, which directly affect both the size of the input and the algorithmic complexity of computing geometric similarity. This section describes how protein structures are approximated and represented as spatial objects, and presents topological properties based on spatial relations.

Approximation of Protein Structures and Representation with Spatial Types. We represent the geometry of the different levels of protein structure with spatial types. Each amino acid is approximated by its central Cα atom, to which a hydrogen atom, the carboxyl and amino groups, and a specific side chain are attached. The primary structure of a protein is the ordered list of its amino acids. We represent the Cα atom of each amino acid in a sequence by a spatial 3D point and handle the sequence as an ordered set of points. The most common secondary structures are helices and sheets, the latter consisting of strands. An SSE has two end Cα atoms and a list of Cα atoms between them. An SSE is approximated by the vector between its two end points. SSEs are therefore modeled as line segments of 10 to 20 points. A protein can thus be considered a mixed set of points and segments.
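The approximation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the names `SSESegment` and `approximate_sse` are hypothetical, and the sketch assumes each SSE arrives as an ordered list of Cα coordinates along the backbone.

```python
from dataclasses import dataclass
from math import dist
from typing import List, Tuple

Point = Tuple[float, float, float]  # one C-alpha atom as a 3D point


@dataclass(frozen=True)
class SSESegment:
    """Hypothetical container: an SSE reduced to a 3D line segment."""
    sse_type: str  # "H" for helix, "S" for strand
    start: Point   # first C-alpha atom of the SSE
    end: Point     # last C-alpha atom of the SSE

    @property
    def length(self) -> float:
        # Euclidean distance between the two end points
        return dist(self.start, self.end)


def approximate_sse(sse_type: str, ca_atoms: List[Point]) -> SSESegment:
    """Keep only the two end C-alpha atoms of an ordered C-alpha list."""
    if sse_type not in ("H", "S"):
        raise ValueError("sse_type must be 'H' (helix) or 'S' (strand)")
    if len(ca_atoms) < 2:
        raise ValueError("an SSE needs at least two C-alpha atoms")
    return SSESegment(sse_type, ca_atoms[0], ca_atoms[-1])


# Usage with made-up coordinates: interior atoms are discarded.
helix = approximate_sse("H", [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (3.0, 4.0, 0.0)])
```

Dropping the interior atoms is what shrinks the input for the filter step; the full atomic description remains available for refinement.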
The SSE representation reduces the size of the input data for the similarity search and facilitates fast retrieval of folds or motifs from 3D protein structure databases at the filtering step of the comparison. The atomic description of proteins can be used in the refinement step.

Topology of Protein Structures. Biologically, topology refers to the three-dimensional fold. More specifically, for a given spatial arrangement of SSEs, the topology describes how these elements are connected. For example, Fig. 1 shows the 3D structure of a protein, and Fig. 2 shows its topology.

Fig. 1. 3D structure of a protein
Fig. 2. Topological diagram (human growth hormone)

The topology of protein structures is more diverse than the structure levels. We group the topology of proteins into three categories: primary, secondary and tertiary topology. Primary topology comprises the type and length of each SSE. Secondary topology, illustrated in Fig. 2, comprises the order of SSEs along the backbone, SSE direction and SSE proximity. Tertiary topology comprises the spatial arrangement of SSEs in 3D space. Among these spatial properties, we focus on SSE proximity and the spatial topology of SSEs in 3D space.

Spatial topology is represented by eight topological relations, which are defined by the 9IM (Intersection Matrix) [7]. The spatial arrangement of SSEs is represented by the topological relations in Fig. 3 and is inferred by topological operators from the coordinates of the two end Cα atoms of each SSE vector. Of the eight relations, we use four major ones (crossover, equal, touch and overlap), which are computed by topological operators based on a join of two R-tree indexes.

Fig. 3. Topological relations

SSE proximity [10] describes the nearest-neighbor SSEs, that is, those for which the minimum distance from a given line segment encoding an SSE vector to the end points of another SSE vector is below a threshold distance. Proximity of an SSE to the preceding element denotes packing of SSEs. In Fig. 4, case (X) is considered proximal if dc and db are < 12.0 Å. Case (Y) is not proximal because the nearest points to a, b, c and d do not lie between the secondary structure end points.

Fig. 4. Proximity of an SSE to the preceding element

Topology Pattern Discovery

To facilitate similarity search over databases, we discover topological patterns from all the structures in a 3D structure database that stores SSEs as spatial objects. The discovered patterns are inserted into the 3D structure database. Using the representation scheme described in the previous section, a discovery algorithm constructs topological pattern lists by examining the topological relations that occur in protein structures. A topological pattern list consists of a combination of binary and n-ary topological relations sorted by the order of SSEs along the backbone. Each type of binary topological relation in the topological pattern list is mapped to a character, as shown in Table 1. The topological pattern lists thus become lists of topology strings. To find similar structures, a similarity search algorithm ultimately compares the lists of topology strings of a given query with those of the structures in a database. This section describes the construction of topological pattern lists and how they are mapped to lists of topology strings.

When SSE types are taken into account, there are fifteen possible binary topological relations. Consider a protein P = {S1, ..., Sk}, where Si and Sj are SSEs, S_T = {H, S} is the set of SSE types (H = Helix, S = Strand), R = {overlap, touch, crossover, equal, proximity} is the set of topological relations, 0 < i, j < k, and k is the number of SSEs in protein P.

Definition 1 (binary topological relation). Let R^2 = {Si ⊙ Sj} be a binary topological relation, where Si ≠ Sj, |R| = 2 (the number of SSEs), ⊙ ∈ R and Si, Sj ∈ {H, S}.

Definition 2 (n-ary topological relation). Let R^n = {(S1 ⊙ S2), (S2 ⊙ S3), ..., (Si ⊙ Sq), (Sq ⊙ Sj), ..., (Sn-1 ⊙ Sn)} be an n-ary topological relation, where 0 < i < q < j ≤ n, all SSEs appear in consecutive order, the binary topological relations R^2 are consecutive, |R| = n, ⊙ ∈ R and Si, Sj ∈ {H, S}.

Table 1. Topology string for binary topological relations

Topological relation (⊙)   Helix ⊙ Helix   Helix ⊙ Strand   Strand ⊙ Strand
Overlap                    A               F                K
Crossover                  B               G                L
Equal                      C               H                M
Touch                      D               I                N
Proximity                  E               J                O
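The mapping in Table 1 can be illustrated with a small lookup. This is a sketch under stated assumptions rather than the authors' code: `TABLE_1`, `pair_key` and `encode_topology` are hypothetical names, the input is assumed to be a list of already-computed binary relations in backbone order, and treating Strand ⊙ Helix the same as Helix ⊙ Strand is an assumption, since the table lists only three type pairs.

```python
from typing import Iterable, Tuple

# Table 1: relation -> SSE type pair -> topology character
TABLE_1 = {
    "overlap":   {"HH": "A", "HS": "F", "SS": "K"},
    "crossover": {"HH": "B", "HS": "G", "SS": "L"},
    "equal":     {"HH": "C", "HS": "H", "SS": "M"},
    "touch":     {"HH": "D", "HS": "I", "SS": "N"},
    "proximity": {"HH": "E", "HS": "J", "SS": "O"},
}


def pair_key(type_a: str, type_b: str) -> str:
    """Normalize a type pair to 'HH', 'HS' or 'SS'.

    Assumption: the pair is unordered, so ('S', 'H') maps to 'HS'.
    """
    if type_a not in ("H", "S") or type_b not in ("H", "S"):
        raise ValueError("SSE types must be 'H' or 'S'")
    return "".join(sorted((type_a, type_b)))


def encode_topology(relations: Iterable[Tuple[str, str, str]]) -> str:
    """Map (type_a, relation, type_b) triples to a topology string."""
    return "".join(TABLE_1[rel][pair_key(a, b)] for a, rel, b in relations)


# Usage: three binary relations listed in backbone order.
pattern = encode_topology([
    ("H", "overlap", "H"),    # -> A
    ("H", "touch", "S"),      # -> I
    ("S", "proximity", "S"),  # -> O
])
print(pattern)  # prints AIO
```

Encoding each relation as one character turns a topological pattern list into an ordinary string, so the later comparison step can work with plain string operations.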





Publication date: 2008